Fraunhofer IAIS – V-Analytics

VAST 2011 Challenge
Mini-Challenge 1 - Characterization of an Epidemic Spread

Authors and Affiliations:

Gennady Andrienko, gennady.andrienko@iais.fraunhofer.de

Natalia Andrienko, natalia.andrienko@iais.fraunhofer.de

Katerina Vrotsou, katerina.vrotsou@iais.fraunhofer.de

Fraunhofer IAIS – Institute for Intelligent Analysis and Information Systems, Sankt Augustin, Germany

Tool(s):

V-Analytics (a.k.a. CommonGIS) is a system that has been developed at Fraunhofer IAIS since 1997. This is a generic system for analysis of different types of spatio-temporal data: spatially referenced time series, spatial events (physical or abstract entities localized in space and time), trajectories of moving objects, and flows (aggregate moves) among places. The system combines various types of data displays and interaction techniques with computational methods.  The display types include maps with a wide range of cartographic visualization techniques, space-time cube, time graph, parallel coordinates plot, scatter plot, and others. The interaction techniques include several types of dynamic filters: spatial, temporal, attribute-based, class-based, specific filter for trajectory segments, and some others. The computational methods include basic arithmetic computations on attribute values, geographic computations, various transformations of the data, such as aggregation or extraction of events, and interfaces to data mining methods, particularly, several methods of clustering. Nothing specific for the VAST challenge has been developed.

 

Video:

 

Open the embedded video in the browser, or right-click and download the MP4 video: V-Analytics_MC1.mp4.

 

 

ANSWERS:


MC 1.1 Origin and Epidemic Spread: Identify approximately where the outbreak started on the map (ground zero location). If possible, outline the affected area. Explain how you arrived at your conclusion.

Outbreaks of two diseases occurred:

A)    disease with symptoms ‘flu’, ‘headache’, 'cough', 'fever', 'chills', 'sweat', 'fatigue', ‘short breath’, etc. started on 18/05/2011;

B)    disease characterized by stomach disorders (‘diarrhea’, ‘nausea’, ‘vomiting’, ‘stomach ache/pain’, ‘abdominal ache/pain’), started on 19/05/2011.

outbreak_areas.png

Figure 1. The locations of the ‘flu’ events (disease A) on 18/05/2011 are shown by yellow circles drawn with 30% opacity. The locations of the ‘diarrhea’ events (disease B) on 19/05/2011 are shown by cyan circles drawn with 60% opacity. The affected areas are outlined by yellow and cyan lines, respectively.

time_graph_sick_by_days+popups.png

Figure 2. The time graph represents the counts of messages containing keywords that indicate sickness, aggregated by days and districts and normalized by the daytime population of the districts. The counts increased on 18/05. Pointing the mouse at a cross symbol shows the most frequent words and phrases from the respective messages. In Downtown these are the symptoms of disease A; in Smogtown, the symptoms of disease B.
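The aggregation behind a time graph of this kind can be sketched in a few lines of Python. This is only an illustrative sketch, not the V-Analytics implementation: the function name, the tuple layout of the messages, and the toy population figures are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def normalized_daily_counts(messages, daytime_population):
    """Count sickness-flagged messages per (day, district), divided by population.

    messages: iterable of (timestamp, district, is_sick) tuples (hypothetical layout).
    daytime_population: dict mapping a district name to its daytime population.
    """
    counts = Counter()
    for timestamp, district, is_sick in messages:
        if is_sick:
            counts[(timestamp.date(), district)] += 1
    # Normalize each count by the daytime population of its district.
    return {
        (day, district): n / daytime_population[district]
        for (day, district), n in counts.items()
    }


# Toy data, invented for illustration only.
messages = [
    (datetime(2011, 5, 17, 9, 0), "Downtown", False),
    (datetime(2011, 5, 18, 10, 0), "Downtown", True),
    (datetime(2011, 5, 18, 11, 30), "Downtown", True),
    (datetime(2011, 5, 19, 8, 15), "Smogtown", True),
]
population = {"Downtown": 1000, "Smogtown": 500}
rates = normalized_daily_counts(messages, population)
```

Normalizing by daytime population keeps densely populated districts from dominating the graph simply because more people post messages there.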

 


MC 1.2 Epidemic Spread: Present a hypothesis on how the infection is being transmitted. For example, is the method of transmission person-to-person, airborne, waterborne, or something else? Identify the trends that support your hypothesis. Is the outbreak contained? Is it necessary for emergency management personnel to deploy treatment resources outside the affected area? Explain your reasoning.

Disease A

Hypothesis 1: Disease A is transmitted by wind (airborne).

Reasoning: Figures 1 and 3 show that on 18/05 the new disease occurrences form a dense cluster stretching from the center to the east, which corresponds to the wind direction (W on 17/05 and 18/05). This supports hypothesis 1.

Hypothesis 2: Disease A is also transmitted from person to person.

Reasoning: Figure 3 shows that on 19/05 and 20/05 the new disease occurrences were densely clustered in the center and also spread throughout the city. This does not correspond to the wind direction (WNW and NW). Figure 4 shows that the persons who got symptoms of disease A moved throughout the city after they had mentioned the illness for the first time. These persons could transmit the infection to others.

However, the people who first reported the illness outside the outbreak area could have visited the area beforehand and caught the illness by inhaling contaminated air. Figure 5 shows a very high density of people’s traces in the outbreak area. To check this more precisely, we computed the distances to the outbreak area from the people’s positions before they reported the illness. We classified the trajectories by the minimum distance (Figure 6) and found that 92% of the ill persons had been inside the outbreak area, or less than 0.01 km from it, before first reporting the symptoms of disease A. Hence, they could have caught the illness while visiting the outbreak area. The remaining 8% could also have visited the outbreak area without leaving traces (messages); in any case, their share of the total number of disease cases is small. Hence, there is not enough evidence in support of hypothesis 2, and it should be rejected.
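The minimum-distance classification can be illustrated with a small Python sketch. It is not the geographic computation built into V-Analytics: it assumes planar coordinates in kilometres, represents the outbreak area as a simple polygon, and uses hypothetical function names and invented toy trajectories.

```python
import math


def point_in_polygon(p, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_area(p, polygon):
    """0 inside the area, otherwise the distance to the nearest boundary edge."""
    if point_in_polygon(p, polygon):
        return 0.0
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


def min_distance(trajectory, polygon):
    """Minimum distance to the area over all positions of one trajectory."""
    return min(distance_to_area(p, polygon) for p in trajectory)


def share_near_area(trajectories, polygon, threshold_km=0.01):
    """Fraction of trajectories that came inside or closer than threshold_km."""
    near = sum(1 for t in trajectories if min_distance(t, polygon) < threshold_km)
    return near / len(trajectories)


# Toy outbreak area and trajectories, invented for illustration only.
area = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
trajectories = [
    [(5.0, 5.0), (1.0, 0.5)],   # passes through the area
    [(3.0, 3.0), (4.0, 4.0)],   # stays far away
    [(2.005, 0.5)],             # just outside, closer than 0.01 km
]
share = share_near_area(trajectories, area)
```

In the toy data two of the three trajectories fall into the "within or closer than 0.01 km" class, which is the same kind of proportion as the 92% reported for the real data.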

The final conclusion is that disease A is airborne.

The outbreak is not contained in terms of the spatial distribution of the disease cases. However, the source of the disease is, most probably, contained in the outlined area. Hence, decontamination resources need to be deployed only in this area. Additional medical resources may be needed for treating the ill people outside the area, particularly, in the hospitals (Figure 4); however, the time graph in Figure 2 and the space-time cube in Figure 3 show a decreasing trend in occurrences of disease A.

STC_flu.png

Figure 3. Space-time cube (STC): two horizontal dimensions represent space; the vertical dimension, from bottom to top, represents time from 0 o’clock on 18/05/2011 till the end of 20/05/2011. The yellow dots show the spatio-temporal positions of the new disease A occurrences, i.e., the first message of each person where the symptoms are mentioned.

ttrajectories_after_flu.png

Figure 4. Parts of the trajectories of the ill persons, from the first report of the disease A symptoms until the last message, are shown with 3% opacity. The visible concentrations of end points of the trajectories are at the hospitals. The boundary of the outbreak area is shown in orange.

ttrajectories_before_flu.png

Figure 5. Parts of trajectories of the ill persons starting from 0 o’clock on 17/05 till the first reporting of the symptoms of disease A are shown with 3% opacity. Many persons seem to have visited the outbreak area before reporting the illness.

ttrajectories_before_flu_classes.png

Figure 6. The parts of the trajectories presented in Figure 5 have been classified according to their minimum distances to the outbreak area. Out of the 18098 ill persons, 16662 (92%) had been inside the outbreak area, or less than 0.01 km from it, before first reporting the illness. Their trajectories have been filtered out to make the remaining 1436 trajectories more visible (20% opacity).

Disease B

Hypothesis: disease B is waterborne and not transmitted from person to person.

Support:

Figure 1 shows that the disease occurrences on the day of the outbreak (19/05) are clustered along the river, which flows southwards. Figure 7 shows that the new occurrences remained clustered along the river also on 20/05 and did not spread over the city. Hence, the outbreak is contained.

STC_diarrhea.png

Figure 7. The STC shows the spatio-temporal distribution of the new occurrences of disease B, i.e., the first message of each person mentioning the symptoms of disease B. The time period is from the beginning of the 18th till the end of the 20th of May.

Data preparation

The original data were put in an Oracle database. Using an SQL query, we attached a flag to each message denoting whether it contains any keywords indicating illness, such as ‘ache’ or ‘fever’. After discovering two different groups of symptoms (Figure 2), we created one additional flag for each group. Then we extracted the identifiers of the persons who had illness flags starting from 17/05/2011 (the day before the outbreak start). For these persons, we created trajectories by connecting the positions of the consecutive messages.
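The preparation steps can be sketched in Python as follows. The original work used SQL queries in an Oracle database, which are not reproduced here; this is an illustrative re-expression under assumptions. The message layout, the function names, and the exact keyword lists (taken from the symptom lists in MC 1.1) are hypothetical, and the sketch keeps only messages from the start date onwards.

```python
from collections import defaultdict
from datetime import datetime

# Keyword groups based on the symptoms listed for diseases A and B (assumed lists).
SYMPTOMS_A = ("flu", "headache", "cough", "fever", "chills", "sweat",
              "fatigue", "short breath")
SYMPTOMS_B = ("diarrhea", "nausea", "vomiting", "stomach", "abdominal")


def flag_message(text):
    """Return (flag_a, flag_b) for one message.

    Plain substring matching is used, so e.g. 'flu' also matches 'fluent';
    a real query would need stricter word matching.
    """
    lowered = text.lower()
    flag_a = any(k in lowered for k in SYMPTOMS_A)
    flag_b = any(k in lowered for k in SYMPTOMS_B)
    return flag_a, flag_b


def build_trajectories(messages, start):
    """Build trajectories for persons with an illness flag on or after `start`.

    messages: iterable of (person_id, timestamp, x, y, text) (hypothetical layout).
    Returns {person_id: [(timestamp, x, y), ...]} ordered by time, so that
    consecutive positions can be connected into a trajectory.
    """
    by_person = defaultdict(list)
    ill = set()
    for person_id, ts, x, y, text in messages:
        if ts < start:
            continue
        by_person[person_id].append((ts, x, y))
        if any(flag_message(text)):
            ill.add(person_id)
    return {pid: sorted(by_person[pid]) for pid in ill}


# Toy messages, invented for illustration only.
messages = [
    ("u1", datetime(2011, 5, 18, 12, 0), 2.0, 1.0, "Got the chills and a fever"),
    ("u1", datetime(2011, 5, 17, 8, 0), 1.0, 1.0, "Off to work"),
    ("u2", datetime(2011, 5, 18, 9, 0), 5.0, 5.0, "Nice weather"),
    ("u3", datetime(2011, 5, 19, 9, 0), 3.0, 0.0, "terrible stomach ache"),
]
trajectories = build_trajectories(messages, datetime(2011, 5, 17))
```

Person u2 never mentions a symptom and therefore gets no trajectory, while u1's two positions are ordered by time regardless of the order in which the messages arrive.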

Possible cause of the outbreaks

The shapes of the outbreak areas (Figure 1) suggest that the origin of the air and water contamination might be near the crossing of the river by road 610, west of Downtown. To detect an event that might have caused the outbreaks, we selected the messages containing keywords such as “accident”, “explosion”, “fire”, “spill”, etc. posted during the week before 18/05 and aggregated them spatially (by territory compartments smaller than the districts, for better event localization) and temporally. Using a time graph like the one in Figure 2, we found when and where the message counts increased and extracted the frequent words and phrases from the respective messages. We detected: a plane crash at the airport on 13/05; a fire near Capital Building on 14/05; bomb threats and searches for bombs in Downtown on 16/05; explosions in Smogtown on 17/05; a truck accident with a big fire on road 610 close to the bridge on 17/05. The truck accident appears to be the most closely related to the outbreaks, both spatially and temporally. By looking at the messages in more detail, we found that it started at about 11 o’clock and that, besides the fire, some trucks were spilling their cargo. This can explain the air and water contamination.
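The spatio-temporal aggregation of event-related messages can be approximated by the Python sketch below. In the original analysis the increases were found visually on a time graph; ranking the counts, as done here, is a simple stand-in for that inspection. The square grid cells, the function names, and the toy messages are assumptions made for the example.

```python
from collections import Counter
from datetime import date

EVENT_KEYWORDS = ("accident", "explosion", "fire", "spill")


def cell_of(x, y, cell_size):
    """Map a position to a square grid cell (a stand-in for the compartments)."""
    return (int(x // cell_size), int(y // cell_size))


def count_event_messages(messages, cell_size):
    """Count messages containing an event keyword per (cell, day).

    messages: iterable of (day, x, y, text) tuples (hypothetical layout).
    """
    counts = Counter()
    for day, x, y, text in messages:
        lowered = text.lower()
        if any(k in lowered for k in EVENT_KEYWORDS):
            counts[(cell_of(x, y, cell_size), day)] += 1
    return counts


# Toy messages, invented for illustration only.
messages = [
    (date(2011, 5, 17), 1.2, 3.4, "Huge truck accident on 610, fire everywhere"),
    (date(2011, 5, 17), 1.4, 3.1, "Trucks spilling cargo near the bridge"),
    (date(2011, 5, 17), 1.3, 3.3, "fire on the highway"),
    (date(2011, 5, 13), 8.5, 0.5, "plane accident at the airport"),
    (date(2011, 5, 17), 1.1, 3.2, "Traffic is awful"),
]
counts = count_event_messages(messages, cell_size=1.0)
# The (cell, day) pairs with the highest counts are candidates for inspection.
top = counts.most_common(1)
```

The messages of the top-ranked cells and days would then be read, or summarized by frequent words, to identify what actually happened there.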